Cross-lingual Discourse Relation Analysis: A corpus study and a semi-supervised classification system
نویسندگان
چکیده
We present a cross-lingual discourse relation analysis based on a parallel corpus with discourse information available only for one language. First, we conduct a corpus study to explore differences in discourse organization between Chinese and English, including differences in information packaging, implicit/explicit discourse expression divergence, and discourse connective ambiguities. Second, we introduce a novel approach to learning to recognize discourse relations, using the parallel corpus instead of discourse annotation in the language of interest. Our resulting semi-supervised system reaches state-of-art performance on the task of discourse relation detection, and outperforms a supervised system on discourse relation classification.
منابع مشابه
Semi-supervised Discourse Relation Classification with Structural Learning
The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a timeconsuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classificat...
متن کاملA Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based...
متن کاملشناسائی رابطه تقابل در گفتمان فارسی به کمک روش های یادگیری باسرپرستی
Discourse is a part of language that intend is used to communicate. A discourse relation recognition system can identify one or more relation between the textual units in a discourse. Like other languages, Contrast relation is a one of the available relations in Persian discourse. Contrast relation recognition in discourse is useful for generation and perception of discourse, paraphrasing and ...
متن کاملTowards Semi-Supervised Classification of Discourse Relations using Feature Correlations
Two of the main corpora available for training discourse relation classifiers are the RST Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), which are both based on the Wall Street Journal corpus. Most recent work using discourse relation classifiers have employed fully-supervised methods on these corpora. However, certain discourse relations have little labeled data, causing l...
متن کاملSemi-Supervised Representation Learning for Cross-Lingual Text Classification
Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a labelrich source language. An effective crosslingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...
متن کامل